63 research outputs found

    Population-level Balance in Signed Networks

    Statistical network models are useful for understanding the underlying formation mechanism and characteristics of complex networks. However, statistical models for signed networks have been largely unexplored. In signed networks, there exist both positive (e.g., like, trust) and negative (e.g., dislike, distrust) edges, which are commonly seen in real-world scenarios. The positive and negative edges in signed networks lead to unique structural patterns, which pose challenges for statistical modeling. In this paper, we introduce a statistically principled latent space approach for modeling signed networks and accommodating the well-known balance theory, i.e., "the enemy of my enemy is my friend" and "the friend of my friend is my friend". The proposed approach treats both edges and their signs as random variables, and characterizes the balance theory with a novel and natural notion of population-level balance. This approach guides us towards building a class of balanced inner-product models, and towards developing scalable algorithms via projected gradient descent to estimate the latent variables. We also establish non-asymptotic error rates for the estimates, which are further verified through simulation studies. In addition, we apply the proposed approach to an international relation network, which provides an informative and interpretable model-based visualization of countries during World War II.
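The balance rule quoted in this abstract can be illustrated with a small sketch. It checks the classical triangle-level notion of balance (a triangle is balanced when the product of its three edge signs is positive), which is not the population-level notion the paper proposes and is not taken from the paper. The function names `is_balanced_triangle` and `balanced_fraction` and the `toy` network are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def is_balanced_triangle(s_ij, s_jk, s_ik):
    """Classical balance: the product of the three edge signs is positive."""
    return s_ij * s_jk * s_ik > 0


def balanced_fraction(signs):
    """Fraction of fully observed triangles that are balanced.

    signs: dict mapping frozenset({u, v}) -> +1 or -1 for each observed edge.
    Returns NaN when the network contains no complete triangle.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            total += 1
            balanced += is_balanced_triangle(*(signs[e] for e in edges))
    return balanced / total if total else float("nan")


# Hypothetical toy network (illustrative signs, not data from the paper).
toy = {
    frozenset(("A", "B")): +1,  # A and B are friends
    frozenset(("B", "C")): -1,  # B and C are enemies
    frozenset(("A", "C")): -1,  # A and C are enemies -> triangle ABC is balanced
    frozenset(("A", "D")): +1,  # A and D are friends
    frozenset(("B", "D")): -1,  # B and D are enemies -> triangle ABD is unbalanced
}

print(balanced_fraction(toy))
```

In the toy network, triangle A-B-C follows "the enemy of my enemy is my friend", while triangle A-B-D violates "the friend of my friend is my friend", so one of the two complete triangles is balanced.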

    Semi‐supervised joint learning for longitudinal clinical events classification using neural network models

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/163377/2/sta4305.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/163377/1/sta4305_am.pd

    The fermentation optimization for alkaline protease production by Bacillus subtilis BS-QR-052

    Introduction: Proteases exhibit a wide range of applications, and among them, alkaline proteases have become a prominent area of research due to their stability in highly alkaline environments. To optimize the production yield and activity of alkaline proteases, researchers are continuously exploring different fermentation conditions and culture medium components. Methods: In this paper, the fermentation conditions of the alkaline protease (EC 3.4.21.14) production by Bacillus subtilis BS-QR-052 were optimized, and the effect of different nutrition and fermentation conditions was investigated. Based on the single-variable experiments, the Plackett–Burman design was used to identify the significant factors, and then the optimized fermentation conditions, as well as the interaction between these factors, were evaluated by response surface methodology through the Box–Behnken design. Results and discussion: The results showed that 1.03% corn syrup powder, 0.05% MgSO4, 8.02% inoculation volume, 1:1.22 vvm airflow rate, as well as 0.5% corn starch, 0.05% MnSO4, 180 rpm agitation speed, 36°C fermentation temperature, 8.0 initial pH and 96 h incubation time were predicted to be the optimal fermentation conditions. The predicted alkaline protease activity was approximately 1787.91 U/mL; subsequent experimental validation reached 1780.03 U/mL, and a 500 L scale-up fermentation reached 1798.33 U/mL. This study optimized the fermentation conditions for alkaline protease production by B. subtilis through systematic experimental design and data analysis, and the activity of the alkaline protease increased to 300.72% of its original level. The established model for predicting alkaline protease activity was validated, achieving significantly higher levels of enzymatic activity. The findings provide valuable references for further enhancing the yield and activity of alkaline protease, thereby holding substantial practical significance and economic benefits for industrial applications.

    Metastatic patterns and prognosis of patients with primary malignant cardiac tumor

    Background: Distant metastases are independent negative prognostic factors for patients with primary malignant cardiac tumors (PMCT). This study aims to further investigate metastatic patterns and their prognostic effects in patients with PMCT. Materials and methods: This multicenter retrospective study included 218 patients with PMCT diagnosed between 2010 and 2017 from the Surveillance, Epidemiology, and End Results (SEER) database. Logistic regression was utilized to identify metastatic risk factors. A Chi-square test was performed to assess the metastatic rate. Kaplan–Meier methods and Cox regression analysis were used to analyze the prognostic effects of metastatic patterns. Results: Sarcoma (p = 0.002) and tumor size > 4 cm (p = 0.006) were independent risk factors of distant metastases in patients with PMCT. Single lung metastasis (about 34%) was the most common of all metastatic patterns, and lung metastases occurred more frequently (17.9%) than bone, liver, and brain metastases. Brain metastases had the worst overall survival (OS) and cancer-specific survival (CSS) among the metastatic sites considered (lung, bone, liver, and brain) (OS: HR = 3.20, 95% CI: 1.02–10.00, p = 0.046; CSS: HR = 3.53, 95% CI: 1.09–11.47, p = 0.036). Conclusion: Patients with PMCT who had sarcoma or a tumor larger than 4 cm had a higher risk of distant metastases. Lung was the most common metastatic site, and brain metastases had the worst survival among the sites considered (lung, bone, liver, and brain). The results of this study provide insight for early detection, diagnosis, and treatment of distant metastases associated with PMCT.

    Statistical Learning for Large-Scale and Complex-Structured Data

    Our modern era has seen an explosion in the amount of valuable information stored in large and complex datasets. The growing scale, diversity of data structures, and incomplete observations in these datasets pose new challenges for statistical learning. Motivated by these challenges, this dissertation addresses the following three problems. (I) The first part of the dissertation presents a novel use of ordinary differential equations (ODEs) to enhance modeling flexibility and computational efficiency in survival analysis for complex and incomplete censored data. Despite rich literature on survival analysis, most existing statistical models and estimation methods still suffer from practical limitations such as restricted model capacity and a lack of scalability for large-scale studies. We introduce a unified ODE framework for survival analysis that allows flexible modeling and enables a statistically efficient procedure for estimation and inference. In particular, the proposed estimation procedure is computationally efficient, easy to implement, and applicable to a wide range of survival models. Moreover, to accommodate data in diverse formats, we extend the ODE framework by leveraging deep neural networks for powerful prediction. (II) The second part of the dissertation focuses on statistical models for signed networks. Statistical network models are useful for understanding the underlying formation mechanism and characteristics of complex networks. However, statistical models for signed networks have been largely unexplored. In signed networks, there exist both positive (e.g., like, trust) and negative (e.g., dislike, distrust) edges, which are commonly seen in real-world scenarios. The positive and negative edges in signed networks lead to unique structural patterns, which pose challenges for statistical modeling.
In this part, we introduce a novel latent space approach for modeling signed networks and accommodating the well-known balance theory in social science, i.e., "the enemy of my enemy is my friend" and "the friend of my friend is my friend". The proposed approach treats both edges and their signs as random variables, and characterizes the balance theory with a novel and natural notion of population-level balance. This approach guides us towards building a class of balanced inner-product models, and towards developing scalable algorithms via projected gradient descent to estimate the latent variables. We also establish non-asymptotic error rates for the estimates. (III) The third part of the dissertation focuses on applications of statistical machine learning to healthcare. In particular, quick and accurate prediction of disease progression can provide valuable information for clinicians to provide appropriate care in a timely manner. The success of prediction models often relies on the availability of a large amount of labeled training data. However, in many healthcare settings, only a small minority of available data is accurately labeled while unlabeled data is abundant. Further, input variables such as clinical events in the medical records are usually of a complex, longitudinal nature, which poses additional challenges. Motivated by the scarcity of annotated data, we propose a new semi-supervised joint learning method for classifying clinical events data, which requires less labeled training data while maintaining the same prediction performance when compared to the supervised method. PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/174618/1/weijtang_1.pd

    KL-divergence Based Deep Learning for Discrete Time Model

    Neural networks (deep learning) are a modern class of models in artificial intelligence and have been applied to survival analysis. Although previous works have shown several improvements, training a strong deep learning model requires a huge amount of data, which may not be available in practice. To address this challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to integrate external survival prediction models with newly collected time-to-event data. Time-dependent KL discrimination information is utilized to measure the discrepancy between the external and internal data. This is the first work to consider using prior information to deal with the problem of limited data in deep learning for survival analysis. Simulation and real data results show that the proposed model achieves better performance and higher robustness compared with previous works. Comment: This paper is not complete and the results are not qualified to be public. Therefore we decided to withdraw the paper and plan to submit a newer version in the future.